Due Friday October 18 at end-of-day
The model structure used is the so-called ARX (Auto-Regressive with eXogenous input) model:
The model can be written
\begin{align*} A(\text{q}) y(k) &= B(\text{q})u(k) + \text{q}^n e(k)\\ (\text{q}^n + a_1\text{q}^{n-1} + \cdots + a_n)y(k) &= (b_0\text{q}^{m} + b_1\text{q}^{m-1} + \cdots + b_m)u(k) + \text{q}^n e(k)\\ y(k+n) + a_1 y(k+n-1) + \cdots + a_n y(k) &= b_0u(k+m) + b_1u(k+m-1) + \cdots + b_m u(k) + e(k+n)\\ y(k+1) + a_1y(k) + \cdots + a_n y(k-n+1) &= b_0 u(k+m-n+1) + b_1u(k+m-n) + \cdots + b_m u(k-n+1) + e(k+1) \end{align*}
The one-step-ahead predictor for this model becomes
\begin{align*} \hat{y}(k+1) &= -a_1 y(k) - a_2 y(k-1) - \cdots - a_n y(k-n+1) \\ &\qquad + b_0 u(k+m-n+1) + b_1 u(k+m-n) + \cdots + b_m u(k-n+1)\\ &= \underbrace{\begin{bmatrix} -y(k) & \cdots & -y(k-n+1) & u(k+m-n+1) & \cdots & u(k-n+1)\end{bmatrix}}_{\varphi^{T}(k+1)} \underbrace{\begin{bmatrix} a_1\\\vdots\\a_n\\b_0\\\vdots\\b_m\end{bmatrix}}_{\theta}\\ &= \varphi^{T}(k+1)\theta. \end{align*}

Note that the white noise term $e(k+1)$ by definition cannot be predicted, neither from previous values of the noise sequence (which we don't know) nor from previous output values $y(t), \; t \le k$ (which could have been used to estimate $\hat{e}(k)$). Therefore $e(k+1)$ is predicted by its mean value, which is zero.

Note also that if our model with $\theta = \theta^*$ is perfect ($\theta^*$ contains the true parameters of the system that generated the data), then the prediction error equals the white noise disturbance: $\epsilon(k+1) = y(k+1) - \varphi^{T}(k+1)\theta^* = e(k+1)$. Therefore, we can check how good a model is by testing how closely the prediction errors resemble a white noise sequence.
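As a small, self-contained illustration of this whiteness test (not part of the assignment code), the sketch below computes the sample autocorrelation of a residual sequence and checks it against the common $\pm 2/\sqrt{N}$ confidence band. The helper name sample_autocorr and the synthetic residuals are assumptions made only for this example.

In [ ]:
import numpy as np

def sample_autocorr(eps, max_lag=20):
    # Sample autocorrelation of the residuals, normalized so that lag 0 equals 1
    eps = eps - np.mean(eps)
    c0 = np.sum(eps**2)
    return np.array([np.sum(eps[lag:] * eps[:len(eps) - lag]) / c0
                     for lag in range(max_lag + 1)])

# For white residuals, the autocorrelation at lags >= 1 should stay mostly
# within +/- 2/sqrt(N) (an approximate 95 % confidence band).
eps = np.random.default_rng(0).standard_normal(500)   # stand-in for prediction errors
acf = sample_autocorr(eps)
print(np.mean(np.abs(acf[1:]) < 2 / np.sqrt(len(eps))))  # fraction of lags inside the band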
The system of equations in the unknown system parameters $\theta$ is $ \Phi \theta = y, $ where \begin{align*} \Phi &= \begin{bmatrix} \varphi^{T}(n+1)\\\varphi^{T}(n+2)\\\vdots\\\varphi^{T}(N)\end{bmatrix},\\ y &= \begin{bmatrix} y(n+1)\\y(n+2)\\\vdots\\y(N)\end{bmatrix}. \end{align*}
The least-squares solution to this system of equations is, by definition, the solution $\hat{\theta}$ which minimizes the sum of squares of the residuals $\epsilon = y-\Phi\theta$, i.e. the solution that minimizes the criterion $ J(\theta) = \epsilon^{T}\epsilon = \sum_i \epsilon_i^2. $ It is given by $ \hat{\theta}_{LS} = \underbrace{(\Phi^{T}\Phi)^{-1}\Phi^{T}}_{\Phi^+} y, $ where $\Phi^+$ is called the Moore-Penrose pseudoinverse of the (typically) non-square, tall matrix $\Phi$.
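As a quick numerical sanity check on synthetic data (not the assignment data), the sketch below verifies that the normal-equation form $(\Phi^{T}\Phi)^{-1}\Phi^{T}y$ agrees with numpy's np.linalg.lstsq and np.linalg.pinv; the regressor matrix and true parameters here are made up for the example.

In [ ]:
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.standard_normal((100, 3))          # tall (synthetic) regressor matrix
theta_true = np.array([0.5, -0.8, 0.2])
y = Phi @ theta_true + 0.01 * rng.standard_normal(100)

# Normal equations: theta = (Phi^T Phi)^{-1} Phi^T y
theta_ne = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
# The same least-squares problem solved by library routines
theta_ls, *_ = np.linalg.lstsq(Phi, y, rcond=None)
theta_pinv = np.linalg.pinv(Phi) @ y

print(np.allclose(theta_ne, theta_ls), np.allclose(theta_ne, theta_pinv))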
The data file contains the signals u1, y1, u1_val, y1_val, where u1_val and y1_val are used for validation.
In [1]:
import numpy as np
import scipy.io as sio
import matplotlib.pyplot as plt
import control
In [7]:
!wget https://alfkjartan.github.io/files/sysid_hw_data.mat
data = sio.loadmat("sysid_hw_data.mat")
In [8]:
# Plot the identification input and output sequences
N = len(data["u1"])
plt.figure(figsize=(14,1.7))
plt.step(range(N), data["u1"])
plt.ylabel("u_1")
plt.figure(figsize=(14,1.7))
plt.step(range(N), data["y1"])
plt.ylabel("y_1")
data["u1"].size  # number of samples in the identification data
Consider the model structure $$y(k) = \frac{b_0\text{q}+b_1}{\text{q}+a} \text{q}^{-1} u(k),$$ which is a first order model with one zero, one pole and one delay. The true system has $b_0=0.2$, $b_1=0$ and $a=-0.8$.
The ARX model can be written $$ y(k+1) = -ay(k) + b_0u(k) + b_1u(k-1) + e(k+1),$$ and so the one-step-ahead predictor becomes $$ \hat{y}(k+1) = -ay(k) + b_0u(k) + b_1u(k-1) = \begin{bmatrix} -y(k) & u(k) & u(k-1) \end{bmatrix}\begin{bmatrix} a\\b_0\\b_1 \end{bmatrix}. $$
The system of equations becomes $$ \underbrace{\begin{bmatrix} -y(2) & u(2) & u(1)\\-y(3) & u(3) & u(2)\\ \vdots & \vdots & \vdots\\ -y(N-1) & u(N-1) & u(N-2) \end{bmatrix}}_{\Phi} \underbrace{\begin{bmatrix} a\\b_0\\b_1\\\end{bmatrix}}_{\theta} = \begin{bmatrix} y(3)\\y(4)\\\vdots\\y(N) \end{bmatrix},$$ which is solved using least squares.
In [9]:
y = np.ravel(data["y1"])
u = np.ravel(data["u1"])
# Build the regressor matrix Phi and the right-hand side yy = [y(3), ..., y(N)]
Phi = np.array([-y[1:N-1],
                u[1:N-1],
                u[:N-2]]).T
yy = y[2:]
# Solve the least-squares problem Phi theta = yy
theta_ls = np.linalg.lstsq(Phi, yy, rcond=None)
print("Estimated: a = %f" % theta_ls[0][0])
print("Estimated: b_0 = %f" % theta_ls[0][1])
print("Estimated: b_1 = %f" % theta_ls[0][2])
In [10]:
# Import the predict_lti function which will calculate the k-step ahead prediction.
from lti_fcns import predict_lti
yv = np.ravel(data["y1_val"])
uv = np.ravel(data["u1_val"])
k = 8 # The prediction horizon
d = 1 # The input delay of the system
a = np.array([1, theta_ls[0][0]])   # denominator polynomial [1, a]
b = np.ravel(theta_ls[0][1:])       # numerator polynomial [b_0, b_1]
(ypred, tpred) = predict_lti(b,a,yv, uv, k, d)
N = len(uv)
plt.figure(figsize=(14,3))
plt.step(range(N), yv)
plt.plot(tpred, ypred, 'ro')
# Calculate the Root Mean Square Error (RMSE) and fit (in %) over the predicted samples
err = yv[tpred[0]:] - ypred
RMSE = np.sqrt(np.mean(np.square(err)))
fit = 100 * (1 - np.linalg.norm(err)/np.linalg.norm(yv[tpred[0]:] - np.mean(yv[tpred[0]:])))
plt.title("RMSE = %f, fit = %f %%" % (RMSE, fit))
In general it is preferable to compare the validation output to the output of a k-step-ahead predictor, where k is chosen to correspond to a typical time interval of the data. For control models, where we have used the rule of thumb of 4-10 sampling periods per rise time, a choice of around $k=10$ is reasonable. A choice of $k=1$ (the one-step-ahead predictor) gives unreliable validation, since the trivial predictor $\hat{y}(k+1) = y(k)$ can give good predictions if the sampling period is short. Think of the prediction "the weather one minute from now will be equal to the weather now". Choosing k extremely large corresponds to a pure simulation of the model (the predicted output sequence depends only on the input sequence) and will not work for unstable models. Also, for models with integration, a small bias in the input sequence will be accumulated and give poor validation results, even for a good model.
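To illustrate the point about one-step-ahead validation, the sketch below applies the trivial persistence predictor $\hat{y}(k+1)=y(k)$ to a densely sampled, slowly varying synthetic signal (unrelated to the assignment data) and evaluates the same fit measure as above. Despite containing no model at all, the predictor scores a fit close to 100 %.

In [ ]:
import numpy as np

# A slowly varying, densely sampled signal: one-step-ahead persistence looks deceptively good
t = np.linspace(0, 10, 1000)
y = np.sin(t) + 0.01 * np.random.default_rng(0).standard_normal(len(t))

y_hat = y[:-1]                    # trivial predictor: y_hat(k+1) = y(k)
err = y[1:] - y_hat
fit = 100 * (1 - np.linalg.norm(err) / np.linalg.norm(y[1:] - np.mean(y[1:])))
print("Fit of the trivial one-step-ahead predictor: %.1f %%" % fit)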